SUMMARY

The complete nucleotide sequence of beet necrotic yellow vein virus (BNYVV) RNA-1 is
presented. The RNA molecule is 6746 nucleotides long excluding the poly(A) tail and
has one long open reading frame encoding a polypeptide of Mr 237389. The 3' terminal
60 residues of BNYVV RNA-1 display extensive sequence homology with the
corresponding portions of BNYVV RNA-2, -3 and -4. Additional 3' terminal homology
exists between RNA-1 and -2. The sequence of the Mr 237389 RNA-1-encoded
polypeptide shares domains of amino acid homology with polypeptides thought to be
involved in replication of RNA from tobacco mosaic virus and several other viruses.
Amino acid sequence homologies between two open reading frames of BNYVV RNA-2
and two frames of RNA-β from barley stripe mosaic virus have also been detected.

INTRODUCTION 

Beet necrotic yellow vein virus (BNYVV) is a multicomponent soil-borne rod-shaped virus
responsible for a severe disease of sugarbeet called rhizomania (Tamada, 1975). Field isolates of
BNYVV typically contain four single-stranded 5'-capped and 3'-polyadenylated plus-sense
RNA molecules (Putz, 1977; Putz et al., 1983). BNYVV RNA-2 [4612 nucleotides, excluding the
poly(A) tail], RNA-3 (1774 nucleotides) and RNA-4 (1467 nucleotides) have already been
sequenced (Bouzoubaa et al., 1985, 1986). In this paper, we report the complete sequence of 6746
nucleotides for RNA-1, thus completing the molecular description of the BNYVV genome.
Similarities of sequence and genetic organization between BNYVV and other RNA viruses are
described.

METHODS 

cDNA synthesis and cloning. RNA-1-specific cDNA clone pBF5 was prepared from RNA of BNYVV isolate F2
as described previously (Richards et al., 1985). Other RNA-1-specific cDNA clones were prepared with RNA
from isolate F13, a subisolate of F2 (Ziegler et al., 1985). In this case the primers for first-strand cDNA synthesis
were synthetic oligodeoxyribonucleotide primers 15 to 22 residues in length complementary to portions of RNA-1
close to the 5' limit of the known sequence. The oligodeoxyribonucleotides were synthesized with an Applied
Biosystems 381A apparatus and purified by polyacrylamide gel electrophoresis. Clones pBF51 and pBF5201 were
obtained by directly cloning dC-tailed RNA-cDNA hybrids into PstI-linearized dG-tailed pUC9 (Bouzoubaa et
al., 1986). Clones pBF5220, pBF541 and pBF542 were obtained by treating the hybrid with RNase H (Bethesda
Research Laboratories) and DNA polymerase I (Boehringer) (Okayama & Berg, 1982; Gubler & Hoffman, 1983)
prior to dC-tailing and cloning. For clone pBF553 second-strand synthesis was primed by a synthetic
oligodeoxyribonucleotide corresponding to nucleotides 1 to 23 of RNA-1.

Recombinant clones were screened for large inserts by restriction enzyme analysis and the selected clones were
screened for RNA-1-specific sequences by Northern hybridization to viral RNA using nick-translated plasmid
DNA from candidate clones as probe (Bouzoubaa et al., 1985). This test proved necessary to eliminate clones
containing RNA-2- and -3-specific cDNA, which occurred frequently even though care was taken to
choose primer sequences with a minimum of complementarity with the known RNA-2 and -3 sequences.

Sequence analysis. cDNA inserts were sequenced by partial chemical degradation (Maxam & Gilbert, 1977)
after digestion with appropriate restriction enzymes and 32P end-labelling or by subcloning restriction fragments
into M13mp10 or M13mp11 (Messing, 1983) for sequencing by the dideoxyribonucleotide chain termination
method (Sanger et al., 1977). Sequence results were also obtained by primer extension of 5'-32P-labelled synthetic
primers hybridized to RNA-1 and analysis of the 5'-labelled cDNA transcript by partial chemical degradation
(Bouzoubaa et al., 1986). The sequence was determined on both strands over about 75% of its total length.

Data were manipulated with UWGCG programs (Devereux et al., 1984) on a VAX 11/750 minicomputer. DNA
and amino acid sequence homologies were compared by the matrix method (Maizel & Lenk, 1981) using the
programs COMPARE and DOTPLOT. Amino acid homology scoring was with the modified MDM78 matrix
(Staden, 1982) normalized by multiplication by 0.1. Stringencies of 32.0 to 33.0 were used for a 30 amino acid
window. Homologous regions were aligned with the program BESTFIT using a direct identity matrix for DNA
comparisons (gap penalty 5, gap length penalty 0.3) and the log odds matrix for 250 accepted point mutations
(normalized by multiplication by 0.25) of Dayhoff et al. (1978) for polypeptides. In the latter case a gap penalty of 4
and gap length penalty of 0.15 were used. Quality was calculated as number of matches - (0.05 x number of
mismatches) - (gap penalty x number of gaps) - (gap length penalty x total length of gaps). All scores greater
than or equal to 0.5 in the normalized log odds matrix were counted as matches.
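
The quality definition above can be restated as a short calculation. The following Python sketch is illustrative only: the function name `alignment_quality` and the example counts are assumptions introduced here and are not part of the UWGCG programs. The penalties are passed as arguments so that either the DNA or the polypeptide settings can be supplied.

```python
def alignment_quality(matches, mismatches, gaps, total_gap_length,
                      gap_penalty, gap_length_penalty,
                      mismatch_weight=0.05):
    """Quality = matches - 0.05 * mismatches
                 - gap_penalty * gaps
                 - gap_length_penalty * total_gap_length."""
    return (matches
            - mismatch_weight * mismatches
            - gap_penalty * gaps
            - gap_length_penalty * total_gap_length)


# Hypothetical alignment: the counts are invented for illustration and the
# penalties are example inputs, not values taken from a real comparison.
example_quality = alignment_quality(
    matches=40, mismatches=20, gaps=2, total_gap_length=5,
    gap_penalty=4, gap_length_penalty=0.15)
print(example_quality)
```

Because each gap costs far more than a mismatch under this definition, an alignment is only extended across a gap when the matches gained on the far side outweigh the penalty.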



RESULTS AND DISCUSSION 

Sequence analysis

Fig. 1 presents a map of the insert positions for the seven cDNA clones used to analyse the
BNYVV RNA-1 sequence. Clone pBF5 was obtained previously (Richards et al., 1985) by the
method of Heidecker & Messing (1983) with PstI-linearized dT-tailed pUC9 as the first-strand
primer during cDNA synthesis. The pBF5 insert has a poly(A) tail of about 100 nucleotides (nt)
at one extremity, thus demonstrating that the insert extends to the 3' terminus of the RNA
molecule and defining the orientation of the sequence. Once the sequence of pBF5 was
completed an oligodeoxyribonucleotide of 15 nt (primer 1, Fig. 1) complementary to the
sequence near its 5' extremity was synthesized and used to prime further cDNA synthesis,
yielding clones pBF5220 and pBF51 which were in turn sequenced. The sequence was used as a
guide for synthesis of a second primer (primer 2) and so on. Unfortunately, the longest cDNA
inserts obtained by this procedure were only about 1200 nt in length so that it had to be carried
out three times before obtaining pBF541 (Fig. 1), which extends to within 594 nt of the 5' end (all
numbering refers to the final sequence).

Initial attempts to clone cDNA covering the final 600 nt were unsuccessful. Therefore, this
portion of the sequence was characterized by direct analysis of cDNA synthesized by reverse
transcription from appropriate 32P-labelled synthetic primers. About 300 residues of sequence
could be read with confidence in this fashion so that three cycles of primer synthesis/cDNA
sequence analysis, using primers 5, 6 and 7 in Fig. 1, were necessary to span the gap. The
extreme 5' terminal sequence was obtained using end-labelled cDNA extended from a primer
complementary to nt 109 to 129 (primer 7 in Fig. 1). This cDNA sequence could be read with
confidence up to and including the nucleotide complementary to the first residue of the RNA
molecule (Bouzoubaa et al., 1986). Finally, a primer identical in sequence to nt 1 to 23 of the
RNA was synthesized and used to prime second-strand synthesis upon cDNA synthesized from
primer 5. This cDNA was cloned (pBF553, Fig. 1) and analysis of its sequence confirmed the
results provided by primer extension.

The complete sequence of BNYVV RNA-1 is 6746 residues in length (Fig. 2) which, assuming
poly(A) tails of approx. 100 nt (Putz et al., 1983), is within 5% of the length of 7100 nt estimated
for methylmercury(II) hydroxide-denatured RNA-1 by agarose gel electrophoresis (Richards et
al., 1985).
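
The agreement between the sequenced length and the gel estimate can be checked with simple arithmetic. A minimal Python sketch using only the figures quoted above follows; the variable names are illustrative.

```python
SEQUENCED_LENGTH = 6746   # nt, excluding the poly(A) tail
POLY_A_LENGTH = 100       # nt, approximate tail length
GEL_ESTIMATE = 7100       # nt, from denaturing agarose gel electrophoresis

# Full-length molecule as it would migrate on the gel.
total_length = SEQUENCED_LENGTH + POLY_A_LENGTH

# Difference expressed as a fraction of the gel estimate.
relative_difference = abs(GEL_ESTIMATE - total_length) / GEL_ESTIMATE

print(f"total length: {total_length} nt")
print(f"difference from gel estimate: {relative_difference:.1%}")
```

With these inputs the total is 6846 nt, about 3.6% below the gel estimate.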

Homology among the four BNYVV RNA species
Homology among the four BNYVV RNA sequences is limited to their extremities. At their 5'
extremities all four molecules begin with short runs of A, A3 in the case of RNA-1, -2 and -4 and
A4 for RNA-3 (Fig. 3). These sequences could form part of the promoter for second-strand RNA
synthesis. G residues are significantly under-represented in the first 30 or 50 residues of all four
BNYVV RNA, as is also the case for the 5' non-coding regions of several other plant viral RNA,
e.g. tobacco mosaic virus (TMV) (Richards et al., 1978), alfalfa mosaic virus (AMV) RNA-4
(Koper-Zwarthoff et al., 1980) and carnation mottle virus (Guilley et al., 1985).

Fig. 1. Nucleotide sequencing strategy for BNYVV RNA-1 showing the alignment of the cDNA clones
(solid bars) with respect to the final sequence. Regions sequenced by primer extension on RNA with
synthetic oligodeoxyribonucleotide primers are indicated by arrows with the primer used in such
analysis or in cDNA synthesis for cloning identified by number. The primers were complementary to
the following residues of the final RNA-1 sequence: 1, nt 3845 to 3860; 2, nt 3293 to 3307; 3, nt 1527 to
1545; 4, nt 1229 to 1250; 5, nt 622 to 642; 6, nt 269 to 278; 7, nt 109 to 129. pBF542 was produced with
primer 3 but the clone did not extend back to the primer sequence.



At their 3' termini we have already shown that extensive sequence homology exists over the
last 200 residues of RNA-3 and -4 (Bouzoubaa et al., 1985) and over the last approx. 70 residues
of RNA-2 (Bouzoubaa et al., 1986). Fig. 4 depicts a sequence alignment for the 3' terminal
regions of all four RNA. The region A of RNA-2 which is homologous to much of domain A of
RNA-3 and -4 is also present on RNA-1 with only three mismatches in 51 positions compared to
RNA-2. Preceding this region on RNA-1 and -2 are two short homologous domains E and D and
a longer region of homology F (Fig. 4). Regions A, D, E and F fall in the same order on RNA-1
and -2 but the spacing between them varies considerably. There is also homology between
domain E of RNA-1/2 and a portion of domain B of RNA-3/4 as well as between portions of
domains F and C (Fig. 4). Both of these alignments are of only limited extent, however,
involving 13 residues (with two mismatches) for the aligned pair E/B and 12 residues with one
mismatch for the pair F/C. Thus while the 3' terminal sequences of RNAs 1 to 4 are clearly all
related to one another the sequence homologies suggest that they may be further divided into two
subgroups with RNA-1 and -2 forming one pair and RNA-3 and -4 the other.
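
The mismatch counts quoted above translate directly into percentage identities. The short Python sketch below makes the conversion explicit; the helper `percent_identity` and the dictionary labels are introduced here for illustration only.

```python
def percent_identity(aligned_positions, mismatches):
    """Percentage of aligned positions that are identical."""
    if aligned_positions <= 0 or not 0 <= mismatches <= aligned_positions:
        raise ValueError("inconsistent alignment counts")
    return 100.0 * (aligned_positions - mismatches) / aligned_positions


# (aligned positions, mismatches) as quoted in the text.
comparisons = {
    "region A, RNA-1 vs RNA-2": (51, 3),
    "E/B": (13, 2),
    "F/C": (12, 1),
}

identities = {name: percent_identity(n, m)
              for name, (n, m) in comparisons.items()}

for name, value in identities.items():
    print(f"{name}: {value:.1f}% identity")
```

Region A is about 94% identical between RNA-1 and RNA-2. The E/B and F/C values are also high (about 85% and 92%), but they rest on only 13 and 12 positions respectively.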

For propagation, BNYVV is commonly transferred by mechanical inoculation from infected
sugarbeet roots to leaves of Chenopodium quinoa. Isolates maintained in this host are often but
not always found to lack one or both of RNA-3 and -4 or to contain deleted forms of these
molecules (Richards et al., 1985; Kuszala et al., 1986; Burgermeister et al., 1986). Thus the two
smallest RNA are certainly not necessary for virus multiplication in Chenopodium quinoa. In
infected sugarbeet roots, on the other hand, full length RNA-3 and -4 are always present (Koenig
et al., 1986) suggesting that RNA-3 and -4 may be essential parts of the BNYVV genome under
the natural conditions of soil transmission and multiplication in roots but not under the artificial
conditions of propagation in leaves. Under the latter conditions, RNA-3 and -4 could persist as
satellites, undergo deletion or disappear entirely from the inoculum. In this regard, the extensive
3' terminal sequence homology noted above strengthens the view that RNA-3 and -4 are in fact
part of the BNYVV genome since such homology is like that displayed by genomic RNA for a
number of multi-component viruses (Davies & Hull, 1982) but is unlike the situation for known
authentic plant satellite RNA, which generally have little if any sequence homology with the
helper genome (Murant & Mayo, 1982).

Coding capacity of RNA-1
RNA-1 contains a long open reading frame extending for 2109 codons from AUG(154), the
first potential initiation codon in the sequence, to UAA(6481) (Fig. 2). No other open reading
frame of more than 60 amino acids is present. The calculated Mr of the translation product of

Fig. 2. Nucleotide sequence of BNYVV RNA-1. The sequence is written as DNA and the amino acid
sequence of the 237K open reading frame (see text) is indicated in one-letter code above the sequence
with an asterisk denoting a termination codon. The 3' terminal poly(A) tail is not shown.

RNA

1 cap-AAAUUCGAUUCUUCCCAUUCGCCAUCAUUG

2 cap-AAAUUCUAACUAUUAUCUCCAUUGAAUA2A

3 cap-AAAAUUCAAAAUUUACCAUUACAUAUUGGU

4 cap-AAAUCAAAUCUCAAAUAUAUAUUUGUAUUU

Fig. 3. Sequence at the 5' terminus of the four BNYVV RNA. A filled circle has been placed above each
G residue.



this long open reading frame, assuming translation begins with the first AUG, is 237389 (237K).
The coding capacity of all four BNYVV RNA is depicted in Fig. 5.
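
The coordinates quoted for the long open reading frame are mutually consistent, as the following Python sketch shows. It assumes that AUG(154) and UAA(6481) give the position of the first nucleotide of each codon, an interpretation that the arithmetic supports; the constant and function names are introduced here for illustration.

```python
RNA1_LENGTH = 6746      # nt, excluding the poly(A) tail
START_CODON = 154       # assumed: position of the A of AUG(154)
STOP_CODON = 6481       # assumed: position of the U of UAA(6481)
SENSE_CODONS = 2109     # codons from the AUG up to, not including, UAA
CALCULATED_MR = 237389


def stop_position(start, sense_codons):
    """First nucleotide of the codon following `sense_codons` in-frame codons."""
    return start + 3 * sense_codons


# Non-coding regions flanking the open reading frame.
leader_length = START_CODON - 1                    # 5' non-coding region
trailer_length = RNA1_LENGTH - (STOP_CODON + 2)    # 3' non-coding region

# Average mass per amino acid residue implied by the calculated Mr.
mean_residue_mass = CALCULATED_MR / SENSE_CODONS

print(stop_position(START_CODON, SENSE_CODONS))
print(leader_length, trailer_length, round(mean_residue_mass, 1))
```

Under this reading, 2109 codons starting at nt 154 end immediately before nt 6481, leaving 153 nt of 5' non-coding sequence and 263 nt between the termination codon and the poly(A) tail. The implied mean residue mass of about 112.6 is in the range expected for a typical polypeptide.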

Computer-assisted sequence comparisons with proteins encoded by several RNA viruses
reveal domains of homology common to non-structural proteins thought to be involved in viral
replication (Haseloff et al., 1984; Kamer & Argos, 1984; Ahlquist et al., 1985; Ali Rezaian et al.,
1984, 1985). The BNYVV 237K polypeptide also contains sequences of this sort. Fig. 6(a and b)
shows portions of the TMV P126 protein and the overlapping readthrough protein P183 aligned
with corresponding parts of the BNYVV 237K polypeptide. The location of the homologous
domains on the genetic map of each virus is given in Fig. 6(c). Note that the regions shown (T1
and T2 of TMV, B1 and B2 of BNYVV) are those of greatest homology; the alignments can be
extended in both directions from these core sequences although with less confidence. Haseloff et
al. (1984) and Ahlquist et al. (1985) have aligned the TMV T1 and T2 regions with the RNA-1-
and -2-encoded proteins of the tripartite viruses AMV and brome mosaic virus (BMV) and with
portions of the Sindbis virus P270 non-structural polyprotein. Kamer & Argos (1984) have
extended the set of homologies characteristic of the readthrough portion of the TMV P183
protein (T2) to include cowpea mosaic virus and the picornaviruses. These comparisons provide
a consensus alignment in which certain residues are conserved in virtually all members of the
family. Fig. 6 shows that these consensus residues are with few exceptions also present in the aligned
BNYVV sequence, thus suggesting involvement of the BNYVV 237K polypeptide in viral RNA
replication and a certain generality as to the mechanism. It should be noted, however, that
homology also exists between the N-terminal regions of the Sindbis virus P270 polyprotein,
TMV P126 and the AMV and BMV RNA-2 proteins (Ahlquist et al., 1985). A counterpart of
this domain in the BNYVV 237K polypeptide, if it exists, is below the level of homology readily
detectable by the search methods used.

Nothing is yet known concerning the translation strategy of BNYVV RNA-1 in vivo but in a
previous cell-free translation study using rabbit reticulocyte lysate, RNA-1 was found to direct
synthesis of 50K and 150K polypeptides as well as lesser amounts of an approx. 200K species
(Ziegler et al., 1985). In view of the sequence results it appears probable that these three
polypeptides all have the same N terminus with the shorter products arising by premature
termination or ribosome 'stalling' (Lindhout et al., 1985) although alternative interpretations
such as internal initiation cannot be conclusively ruled out. When translated in a wheat germ
system, RNA-1 directs synthesis of comparable amounts of two very long polypeptides, one of
about 200K and one which is slightly longer (data not shown). The latter species could
correspond to translation of the entire RNA-1 open reading frame.

Homology between BNYVV RNA-2 and barley stripe mosaic virus RNA-β

Barley stripe mosaic virus (BSMV) is a tripartite virus for which, like BNYVV, synthesis of
coat protein is directed by the second largest genomic RNA (RNA-β) (Gustafson & Armour,
1986). The complete nucleotide sequence of RNA-β from the type strain of BSMV has recently
been reported (Gustafson & Armour, 1986). The sequence contains four open reading frames

Fig. 4. A BESTFIT alignment of the 3' terminal sequence of the four BNYVV RNA (a) and a schematic
representation of the 3' terminal region (b). The homologous blocks of sequence denoted by letters in (a)
are stippled and identified in (b).

Fig. 5. Genetic maps of the four BNYVV RNA. Open reading frames are depicted as hollow rectangles
with the coding capacity of each indicated within. CP refers to the 21K viral coat protein. The small
arrow above RNA-2 represents a leaky termination codon (Ziegler et al., 1985). Solid bars represent 3'
terminal homologous sequences common to all four RNA and the hatched bars the position of 3'
terminal homologous sequences found only in RNA-3 and -4.

Fig. 7. Graphic homology comparisons of (a) BSMV 58K and BNYVV 42K proteins and (b) BSMV
14K and BNYVV 13K proteins. [...] limits of the homologous regions.



which code for viral coat protein, and polypeptides of 58K, 14K and 17K. The coat protein
cistron lies nearest the 5' terminus of the RNA, the cistron for the 58K polypeptide occupies the
central portions of the RNA and the 17K coding region is nearest the 3' end (Fig. 8c). The
cistron for the 14K polypeptide overlaps the 3' end of the 58K cistron, the 5' portion of the 17K
cistron and the region which separates them (Fig. 8c).

In view of the superficial resemblance in genetic organization between BNYVV RNA-2 and
BSMV RNA-β, homology dot plots were generated for all possible pairwise combinations of the
proteins encoded by the two RNA. Such analysis reveals extensive amino acid homology
between the BNYVV 42K polypeptide and the C-terminal half of the BSMV 58K polypeptide,
and between the BNYVV 13K and BSMV 14K proteins (Fig. 7). Detailed alignments are shown
in Fig. 8(a) and (b). These alignments are statistically highly significant according to the
empirical test of Doolittle (1981). The quality scores for the 42K/58K (Fig. 8a) and 13K/14K
(Fig. 8b) alignments are, respectively, 17.7 and 24.5 standard deviations above the means of the
quality scores obtained by reiterative comparisons of the same sequences after randomization. A
difference of more than 3 standard deviations in this test is normally accepted as indicating
statistical significance (Doolittle, 1981).
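
The logic of the randomization test can be sketched in a few lines of Python. This is a simplified illustration and not the procedure actually run: it scores ungapped positional identities instead of BESTFIT quality, and the function names, trial count, seed and test string are all assumptions introduced here.

```python
import random
import statistics


def identity_score(seq_a, seq_b):
    """Count identical residues at matching positions (no gaps allowed)."""
    return sum(a == b for a, b in zip(seq_a, seq_b))


def randomization_z_score(seq_a, seq_b, score=identity_score,
                          trials=200, seed=0):
    """Standard deviations by which the observed score exceeds the mean
    score of the same two sequences after independent shuffling."""
    rng = random.Random(seed)
    observed = score(seq_a, seq_b)
    shuffled_scores = []
    for _ in range(trials):
        a, b = list(seq_a), list(seq_b)
        rng.shuffle(a)   # shuffling preserves composition and length
        rng.shuffle(b)
        shuffled_scores.append(score(a, b))
    mean = statistics.mean(shuffled_scores)
    spread = statistics.pstdev(shuffled_scores)
    if spread == 0:
        raise ValueError("shuffled scores have zero spread")
    return (observed - mean) / spread


# Arbitrary 40-residue string compared with itself (illustration only).
test_seq = "MADSFGFTPHEVLLFCGESVQLLTSDMPIDVQHCFVYSTR"
z = randomization_z_score(test_seq, test_seq)
print(f"z = {z:.1f}")
```

Shuffling keeps the amino acid composition of each sequence fixed, so a large z value indicates similarity that depends on residue order and not merely on shared composition.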

No information is yet available for either virus concerning the roles these homologous 
polypeptides might play in the virus multiplication cycle but the very existence of such 
homologies provides evidence that the open reading frames in question do in fact correspond to 
expressed genes. Such assurance is particularly comforting for the short 13K BNYVV and 14K 
BSMV coding regions. The homologies would also seem to indicate that BNYVV and BSMV are 
more closely related than hitherto suspected. 



Fig. 6. BESTFIT alignments of regions T1 and B1 (a) and T2 and B2 (b) of the TMV and BNYVV
RNA-1 large non-structural proteins. Identical aligned amino acids are indicated by vertical lines and
favoured substitution [score ≥ 0.5 in the normalized log odds matrix (see Methods)] by two dots. Single
dots indicate alignments of S and T. Closely similar alignments were generated using a direct identity
scoring matrix. In (a) asterisks above the sequence refer to residues conserved in the alignment of the
TMV sequence with that of TMV, AMV, BMV and Sindbis virus (Ahlquist et al., 1985). In (b) double
asterisks denote strictly conserved residues and single asterisks conserved familial residues in the
alignment of Kamer & Argos (1984). The underlined sequences indicate sequence motifs conserved in
the sequence of the putative replicases of all plus-strand RNA viruses which have been sequenced to
date (Guilley et al., 1985 and further observations). (c) Genetic organization of TMV RNA and
BNYVV RNA-1. The positions of the homologous domains T1 and T2 on TMV and B1 and B2 on
BNYVV RNA-1 are indicated. The small arrow represents the leaky termination codon whose
readthrough gives rise to P183 (Pelham, 1978).

Fig. 8. BESTFIT alignments of portions of the BNYVV 42K protein with the BSMV 58K protein (a)
and of the BNYVV 13K protein with the BSMV 14K protein (b). Symbols for amino acid matches are as
in Fig. 6. (c) Genetic maps of BNYVV RNA-2 and BSMV RNA-β. The homologous regions shown
in (a) are shaded. The small arrow above the BNYVV map represents a leaky termination codon
(Bouzoubaa et al., 1986). The star in the BSMV map indicates the position of the tRNA-like structure
following the internal poly(A) tract (Gustafson & Armour, 1986).



CONCLUSIONS 

With this report, BNYVV becomes the eighth plant RNA virus for which complete sequence
information is available. In genome organization BNYVV shares features with certain other
plant viruses while other properties set it apart. In particular, BNYVV joins the tobraviruses
(Harrison & Robinson, 1978), soil-borne wheat mosaic virus (SBWMV) (Hsu & Brakke, 1985)
and Indian peanut clump virus (IPCV) (Mayo & Reddy, 1985) in expressing its coat protein as a
primary translation product of its second largest RNA. In both BNYVV (Ziegler et al., 1985)
and SBWMV (Hsu & Brakke, 1985) the coat protein cistron can be read through to produce
longer polypeptides. However, BNYVV RNA are 3' polyadenylated whereas those of SBWMV,
IPCV, BSMV and the tobraviruses are not (Harrison & Robinson, 1978; Kozlov et al., 1984;
Hsu & Brakke, 1985; Mayo & Reddy, 1985). Furthermore, SBWMV, IPCV and the
tobraviruses have bipartite genomes (Harrison & Robinson, 1978; Shirako & Brakke, 1984;
Reddy et al., 1985) but, as noted above, the genome of BNYVV may well consist of four
components, at least when propagated in its natural host.

We have shown in the preceding section that there are similarities in overall genetic
organization between BNYVV RNA-2 and RNA-β of BSMV as well as homologies at the
amino acid sequence level. The other RNA components of BSMV, however, differ significantly
from their counterparts in BNYVV: (i) BSMV RNA-α, although it resembles BNYVV RNA-1
in directing synthesis of a long polypeptide (Dolja et al., 1983), is only about half the length of
BNYVV RNA-1; (ii) BSMV RNA-γ is dicistronic, encoding 75K to 85K and 17K polypeptides,
the latter expressed from a subgenomic RNA (Gustafson et al., 1981; Dolja et al., 1983; Jackson
et al., 1983). Thus BNYVV appears to possess a unique spectrum of characteristics
distinguishing it from all known groups of plant RNA viruses.